Using hybrid shared and distributed caching for mixed-coherency GPU workloads

Author: Nasser Anssari
Publication date: 01/12/2012

Current GPU computing models support a mixture of coherent and incoherent classes of memory operations. Workloads using these models typically have working sets too large to fit in an economical SRAM structure. Still, GPU architectures have last-level caches, primarily to fulfill two functions: eliminating redundant DRAM accesses when requests from different L1 caches target the same line, and maintaining on-chip memory coherence for the coherent class of memory operations. In this thesis, we propose an alternative memory system design for GPU architectures that is better suited to their workloads. Our architectural design features a directory-like sharing tracker that allows the incoherent private L1 caches to directly satisfy remote requests for shared data. It also retains a shared L2 cache with a customized caching policy to support coherent accesses on-chip and better serve non-coalesced requests that contend aggressively for cache lines. This thesis characterizes the tradeoffs in area, energy, and performance between the components of our proposed memory system design. We show that the proposed design achieves a 22% average reduction in DRAM data demand over a standard GPU architecture with a 1 MB L2 cache, leading to an overall 28% reduction in memory system energy consumption on average. In addition, our results show that the DRAM data demand of the proposed design with a 256 KB L2 cache is on par with that of a standard GPU architecture with a 1 MB L2 cache, but with smaller area overhead and leakage power. Our results, while motivated by the GPU realm, are not architecture-specific and can be extended to other throughput-oriented many-core organizations.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Novel Moment Features Extraction for Recognizing Handwritten Arabic Letters

Authors: Gheith Abandah; Nasser Anssari
Publication venue: 'Science Publications'

Crossref

A new system to analyze pulsatile flow characteristics in elastic tubes for hemodynamic applications

Authors: Anssari-Benam Afshin; Fatouraee Nasser; Khani Mohammad M.; Pashaiee Ali; Tafazzoli-Shadpour Mohammad
Publication venue: 'Science Publications'
Publication date: 31/12/2008

Portsmouth University Research Portal (Pure)

Parallel implementation of multi-dimensional ensemble empirical mode decomposition

Authors: Ke-hsin Hsu; Li-wen Chang; Men-tzung Lo; Nasser Anssari; Norden E. Huang; Wen-mei W. Hwu
Publication date: 01/01/2011

In this paper, we propose and evaluate two parallel implementations of Multi-dimensional Ensemble Empirical Mode Decomposition (MEEMD) for multi-core (CPU) and many-core (GPU) architectures. Relative to a sequential C implementation, our double-precision GPU implementation, using the CUDA programming model, achieves up to 48.6x speedup on an NVIDIA Tesla C2050. Our multi-core CPU implementation, using the OpenMP programming model, achieves up to 11.3x speedup on two eight-core Intel Xeon X7550 CPUs.

CiteSeerX

Crossref

Data Layout Transformation Exploiting Memory-Level Parallelism in Structured Grid Many-Core Applications

Authors: C.D. Gundolf; C.D. Spradling; E. Ipek; I-Jui Sung; J.H. Ferziger; J.M. Anderson; J.W. Demmel; John A. Stratton; K.W. Morton; M.E. Mace; N.R. Mahapatra; Nasser Anssari; O. Mutlu; S. Sellappa; S. Girbal; T. Pohl; Wen-mei W. Hwu; Y. Zhao; Y.H. Qian
Publication venue: 'Springer Science and Business Media LLC'

Crossref